Abstract

This paper examines the discourse functions that different types of subjects perform in Italian within the centering framework (Grosz et al., 1995). I build on my previous work (Di Eugenio, 1990) that accounted for the alternation of null and strong pronouns in subject position. I extend my previous analysis in several ways: for example, I refine the notion of CONTINUE and discuss the centering functions of full NPs.



1 Introduction 

Interpreting referential expressions is important for any large-coverage NL system; while such systems do exist for Italian, e.g. (Stock et al., 1993; Lombardo and Lesmo, 1994), to my knowledge not much attention has been devoted to the interpretation of Italian referential expressions. Some exceptions are (Samek-Lodovici and Strapparava, 1990), which discusses the interpretation of referential expressions within dialogues to access a videodisc on Italian art; (Not and Zancanaro, 1995), which adopts a systemic grammar approach (Halliday, 1976); and (Di Eugenio, 1990), which uses centering theory (Grosz et al., 1995) to account for the alternation of null and strong subjects.



In this paper, I build on and expand (Di Eugenio, 1990) in several ways. First, I reanalyze the hypotheses I proposed earlier with respect to a corpus of naturally occurring data.¹ I show that those hypotheses are basically supported, and that further insight can be gained by looking at a two-member sequence of centering transitions rather than at just one transition. Second, I extend my previous analysis by also discussing the centering functions of full NPs in subject position, and some occurrences of pronouns unaccounted for by centering.

¹ The examples in (Di Eugenio, 1990) were constructed.

2 Centering theory 



Centering theory (Grosz et al., 1986; Brennan et al., 1987; Grosz et al., 1995) models local coherence in discourse: it keeps track of how the local focus varies from one utterance to the next. Centering postulates that²:

• Each utterance U_n has associated with it a set of discourse entities, the forward-looking centers or Cfs. The Cf list is ranked according to discourse salience.

• The BACKWARD-LOOKING CENTER, or Cb, is the member of the Cf list that U_n most centrally concerns, and that links U_n to the previous discourse.

• Finally, the preferred center, or Cp, is the highest-ranked member of the Cf list. The Cp represents a prediction about the Cb of the following utterance.

Transitions between two adjacent utterances U_{n-1} and U_n can be characterized as a function of looking backward (whether Cb(U_n) is the same as Cb(U_{n-1})) and of looking forward (whether Cb(U_n) is the same as Cp(U_n)). Table 1 illustrates the four transitions that are defined according to these constraints. (Brennan et al., 1987) propose a default ordering on transitions which correlates with discourse coherence: CONTINUE is preferred to RETAIN, which is preferred to SMOOTH-SHIFT, which is preferred to ROUGH-SHIFT.³

                         Cb(U_n) = Cb(U_{n-1})    Cb(U_n) ≠ Cb(U_{n-1})
    Cb(U_n) = Cp(U_n)    CONTINUE                 SMOOTH-SHIFT
    Cb(U_n) ≠ Cp(U_n)    RETAIN                   ROUGH-SHIFT

Table 1: Centering Transitions

² The version of centering I present here is from (Brennan et al., 1987).
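Because Table 1 is defined by two equality tests, it can be restated as a small function. The Python sketch below is only an illustration under simplifying assumptions: entities are plain strings, the names classify_transition, PREFERENCE and more_coherent are hypothetical, and the unspecified Cb of a segment-initial utterance (Cb = ?) is not handled.

```python
def classify_transition(cb_prev: str, cb_cur: str, cp_cur: str) -> str:
    """Label the transition from U_{n-1} to U_n following Table 1.

    cb_prev: Cb(U_{n-1}); cb_cur: Cb(U_n); cp_cur: Cp(U_n).
    Entities are plain strings (a simplifying assumption of this sketch).
    """
    same_cb = cb_cur == cb_prev   # looking backward: Cb(U_n) = Cb(U_{n-1})?
    cb_is_cp = cb_cur == cp_cur   # looking forward: Cb(U_n) = Cp(U_n)?
    if same_cb:
        return "CONTINUE" if cb_is_cp else "RETAIN"
    return "SMOOTH-SHIFT" if cb_is_cp else "ROUGH-SHIFT"


# Default preference order of (Brennan et al., 1987): earlier = more coherent.
PREFERENCE = ["CONTINUE", "RETAIN", "SMOOTH-SHIFT", "ROUGH-SHIFT"]


def more_coherent(t1: str, t2: str) -> bool:
    """True if transition t1 is preferred to transition t2 under the default order."""
    return PREFERENCE.index(t1) < PREFERENCE.index(t2)


if __name__ == "__main__":
    # The four continuations (2c).i-iv, where Cb(U_{n-1}) = John.
    print(classify_transition("John", "John", "John"))  # CONTINUE
    print(classify_transition("John", "John", "Mary"))  # RETAIN
    print(classify_transition("John", "Mary", "Mary"))  # SMOOTH-SHIFT
    print(classify_transition("John", "Mary", "Lucy"))  # ROUGH-SHIFT
```

The calls in the main block reproduce the labels given for (2c).i to (2c).iv in example (2).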

The saliency ordering on the Cf list, which is generally equated with grammatical function, for Western languages is SUBJECT > OBJECT2 > OBJECT > OTHERS, where OTHERS includes prepositional phrases and adjuncts. (Kameyama, 1985) was the first to point out that for languages such as Japanese, empathy and topic marking affect the Cf ordering, and proposed the following ranking:

(1) EMPATHY > SUBJECT > OBJECT2 > OBJECT > OTHERS

I follow (Turan, 1995) in adopting (1) also for Western languages. Turan argues that a notion analogous to empathy arises in Western languages as well: e.g. with perception verbs, it is the experiencer, which is often in object position, rather than the grammatical subject, that should be ranked higher.
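Ranking (1) amounts to sorting the entities realized in an utterance by the salience of their role. A minimal Python sketch, assuming that each entity arrives already tagged with one of the roles in (1); deciding when a role counts as EMPATHY (e.g. for the experiencer of a perception verb, as Turan suggests) is left to the caller, and the names RANK and rank_cf are hypothetical.

```python
# Salience ranking (1): EMPATHY > SUBJECT > OBJECT2 > OBJECT > OTHERS.
# Lower number = more salient.
RANK = {"EMPATHY": 0, "SUBJECT": 1, "OBJECT2": 2, "OBJECT": 3, "OTHERS": 4}


def rank_cf(realized):
    """Return the Cf list (most salient first) for one utterance.

    realized: list of (entity, role) pairs in surface order, where role is a
    key of RANK. Ties keep surface order because sorted() is stable; the
    text does not say how ties should be broken, so this is an assumption.
    """
    ordered = sorted(realized, key=lambda pair: RANK[pair[1]])
    return [entity for entity, _ in ordered]


if __name__ == "__main__":
    # (2b) "He met Mary yesterday."
    print(rank_cf([("John", "SUBJECT"), ("Mary", "OBJECT")]))  # ['John', 'Mary']
    # Hypothetical clause whose experiencer, in object position, is tagged EMPATHY.
    print(rank_cf([("Mary", "SUBJECT"), ("John", "EMPATHY")]))  # ['John', 'Mary']
```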

Finally, centering provides an interesting framework for studying the functions of pronouns, as the observation that the Cb is often deleted or pronominalized can be stated as the following rule:

Rule 1: If some element of Cf(U_{n-1}) is realized as a pronoun in U_n, then so is Cb(U_n).

This rule has been computationally interpreted to identify the Cb. If U_n has:

• a single pronoun, that pronoun realizes Cb(U_n);

• zero or more than one pronoun, Cb(U_n) is:

  - Cb(U_{n-1}) if Cb(U_{n-1}) is realized in U_n;

  - otherwise the highest-ranked element of Cf(U_{n-1}) which is realized in U_n.
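The heuristic above can be sketched in Python as follows. This is a hypothetical illustration, not the implementation of any cited system: the function name compute_cb and its argument layout are assumptions, and for Italian the list of pronouns is assumed to include null subjects.

```python
from typing import List, Optional, Sequence, Set


def compute_cb(cb_prev: Optional[str], cf_prev: Sequence[str],
               realized: Set[str], pronouns: List[str]) -> Optional[str]:
    """Identify Cb(U_n) with the computational reading of Rule 1.

    cb_prev:  Cb(U_{n-1}), or None if it is unspecified
    cf_prev:  Cf(U_{n-1}), most salient first
    realized: entities realized in U_n
    pronouns: entities realized as pronouns in U_n (assumed to include
              null subjects when applied to Italian)
    """
    if len(pronouns) == 1:
        # A single pronoun realizes the Cb.
        return pronouns[0]
    if cb_prev is not None and cb_prev in realized:
        # Zero or several pronouns: keep the previous Cb if it is realized.
        return cb_prev
    for entity in cf_prev:
        # Otherwise: the highest-ranked element of Cf(U_{n-1}) realized in U_n.
        if entity in realized:
            return entity
    return None  # no element of Cf(U_{n-1}) is realized; the rule is silent here


if __name__ == "__main__":
    # (2b) "He met Mary yesterday." after (2a), whose Cb is unspecified.
    print(compute_cb(None, ["John"], {"John", "Mary"}, ["John"]))  # John
    # (2c).ii "She likes him.": two pronouns, and the previous Cb is realized.
    print(compute_cb("John", ["John", "Mary"], {"Mary", "John"}, ["Mary", "John"]))  # John
    # (2c).iv "Lucy was with her.": a single pronoun.
    print(compute_cb("John", ["John", "Mary"], {"Lucy", "Mary"}, ["Mary"]))  # Mary
```

The three calls correspond to (2b), (2c).ii and (2c).iv in example (2). Returning None when nothing links U_n to U_{n-1} is a choice of this sketch.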

Let's apply centering to the constructed example in (2). In (2a), Cb = ? because the Cb of a segment-initial utterance is left unspecified; in (2b) the Cb is John, as he is realized by the only pronoun, and is also the only entity belonging to the Cf list of (2a) realized in (2b).

(2a) John is a nice guy.
     Cb = ?, Cf = [John]
(2b) He met Mary yesterday.
     Cb = John, Cf = [John > Mary]
(2c) i.   He likes her. (CONTINUE)
          Cb = John, Cf = [John > Mary]
     ii.  She likes him. (RETAIN)
          Cb = John, Cf = [Mary > John]
     iii. She was with Lucy. (SMOOTH-SHIFT)
          Cb = Mary, Cf = [Mary > Lucy]
     iv.  Lucy was with her. (ROUGH-SHIFT)
          Cb = Mary, Cf = [Lucy > Mary]

In (2c).i we have a CONTINUE, as its Cb is John (the highest-ranked entity on the Cf list of (2b)), and so is its Cp. In (2c).ii, the Cb is still John as in (2c).i, but the Cp now is Mary; thus we have a RETAIN. In both (2c).iii and (2c).iv the Cb is Mary (the only entity belonging to the Cf list of (2b) that is realized): as Mary is also the Cp in (2c).iii, a SMOOTH-SHIFT occurs. Instead, as Lucy is the Cp in (2c).iv, a ROUGH-SHIFT occurs.

Centering theory has appealing traits from both cognitive and computational points of view. From a cognitive perspective, it explains certain phenomena of local discourse coherence (e.g. pronominal "garden paths"), and is supported by psycholinguistic experiments (Gordon et al., 1993). Computationally, it is a simple mechanism, and thus it has been the basis for simple algorithms for anaphora resolution (Brennan et al., 1987).

Much work still remains to be done on centering. For example, most development so far has been based on simple constructed examples: to apply centering to real text, issues such as how possessives and subordinate clauses affect referring expression resolution must be addressed. This paper is a contribution in that direction.

3 The Italian pronominal system 



Italian has two pronominal systems (Calabrese, 1986): weak pronouns, which must always be cliticized to the verb (e.g. lo, le, gli: respectively him, accusative; them, feminine accusative, or her, dative; him, dative), and strong pronouns (lui, lei, loro: respectively he or him; she or her; they or them).⁴ The null subject is considered part of the system of weak pronouns.

³ (Grosz et al., 1986; Grosz et al., 1995) propose that the ordering on transitions pertains to sequences of transitions rather than to single transitions.

⁴ Lui, lei, loro are the oblique forms of the strong system, while the nominative forms are respectively egli, ella, essi/e: in current Italian the latter forms are rarely used, as the oblique forms have replaced them in subject position. In my corpus there are only four occurrences of these nominative forms, and they all occur in the same article (Pagetti, 1993).

Weak and strong pronouns are often in complementary distribution, as strong pronouns have to be used in prepositional phrases, e.g. per lui, for him. However, this syntactic alternation doesn't apply in subject position. The choice of null versus strong pronoun depends on pragmatic factors; the centering explanation offered in (Di Eugenio, 1990) goes as follows:

(3a) Typically, a null subject signals a CONTINUE, and a strong pronoun a RETAIN or a SHIFT.

(3b) A null subject can be felicitously used in cases of RETAIN or SHIFT if in U_n the syntactic context up to and including the verbal form(s) carrying tense and/or agreement forces the null subject to refer to a particular referent and not to Cb(U_{n-1}).

The evidence for (3b) provided in (Di Eugenio, 1990) derived, among other sources, from modal and control verb constructions, in which clitics may be cliticized to the infinitival complement of the higher verb or may climb in front of the higher verb. When the clitic climbs, certain pronominal "garden path" effects, deriving from a wrong interpretation initially assigned to the null subject and later retracted, are avoided.

4 Italian subjects in discourse
4.1 The corpus

The corpus amounts to about 25 pages of text, and 12,000 words; it is composed of excerpts from two books (von Arnim, 1989; Fallaci, 1989), a letter (Mila, 1993), a posting on the Italian bulletin board (^CI, 1994), a short story (Nichetti, 1993), and three articles from two newspapers (del Buono, 1993; Pagetti, 1993; La Nazione, 1994). The excerpts are of different lengths, with the excerpts from the two books being the longest.

Texts were chosen to cover a variety of contemporary written Italian prose, from formal (newspaper articles about politics and literature) to informal (posting on the Italian bulletin board), and according to the following criteria: a) minimal direct speech, which has not been addressed in centering yet; b) prose that describes situations involving several animate referents, because strong pronouns can refer only to animate referents.

Table 2 shows the distribution of animate third person subjects partitioned into: full NPs (the numbers in parentheses refer to possessive NPs); strong pronouns; null subjects (I counted only those whose antecedents are not determined by contraindexing constraints (Chomsky, 1981)); and other anaphors (e.g. tutte, all (fem.)), which won't be analyzed in this paper.



4.2 Issues 



When applying centering to real text, one realizes 
that many issues have not been solved yet. I will 
comment here on how deictics, possessives, and 
subordinate clauses affect centering. 

Deictics such as I, you, etc. The problem is whether they are part of the Cf list or not. I follow (Walker, 1993) in assuming that deictics are always available as part of the global focus, and therefore are outside centering.

Possessives. Tables 2 and 3 include a category for possessive NPs, i.e. full NPs that include a possessive adjective referring to an animate entity, such as i suoi sforzi (his efforts).

The problem is how possessives affect Cb computation and Cf ordering. While Cb computation does not appear to be affected by a possessive, which behaves like a pronoun, the Cf ranking needs to be modified. An NP of type POSSESSIVE refers to two entities, the possessor Por and the possessed Ped. Ped corresponds to the full NP, and thus its position in Cf is determined by the NP's grammatical function; as regards Por, my working heuristic is to rank it immediately before Ped if Ped is inanimate, and immediately after Ped if Ped is animate. This heuristic appears to work, but needs to be rigorously tested.
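The working heuristic for possessives can be sketched as a small list operation. In this Python sketch the function name place_possessor and the entity names Gianni and Maria are hypothetical, animacy is assumed to be supplied by the caller, and the possessor is assumed not to be on the Cf list already.

```python
from typing import List, Set


def place_possessor(cf: List[str], possessed: str, possessor: str,
                    animate: Set[str]) -> List[str]:
    """Insert the possessor (Por) next to the possessed entity (Ped) in a Cf list.

    cf:      Cf list, most salient first, already containing `possessed`
             at the position given by the NP's grammatical function
    animate: names of animate entities (an input this sketch assumes)
    """
    i = cf.index(possessed)
    # Ped inanimate -> Por ranks immediately before Ped;
    # Ped animate   -> Por ranks immediately after Ped.
    position = i + 1 if possessed in animate else i
    return cf[:position] + [possessor] + cf[position:]


if __name__ == "__main__":
    people = {"Gianni", "Maria", "moglie"}
    # Subject "i suoi sforzi" (his efforts): inanimate Ped, so Por comes first.
    print(place_possessor(["sforzi", "Maria"], "sforzi", "Gianni", people))
    # ['Gianni', 'sforzi', 'Maria']
    # Subject "sua moglie" (his wife): animate Ped, so Por follows it.
    print(place_possessor(["moglie", "Maria"], "moglie", "Gianni", people))
    # ['moglie', 'Gianni', 'Maria']
```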



Subordinates. Another important issue, which has not been extensively addressed yet (but see (Kameyama, 1997; Suri and McCoy, 1993)), is how to deal with complex sentences that include coordinates and subordinates. The questions that arise concern whether there are independent Cbs and Cf lists for every clause; if not, how the Cb of the complex sentence is computed, and how semantic entities appearing in different clauses are ordered on the global Cf list.

In this paper, I will loosely adopt Kameyama's proposal (1997) that sentences containing conjuncts and tensed adjuncts are broken down into a linear sequence of centering "units", while tenseless adjuncts don't generate independent centering units.⁵

⁵ The situation for complements is more complicated, and space prevents me from discussing it.
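A loose reading of this segmentation can be sketched as follows. The clause labels are hypothetical annotations assumed to come from a parser or from hand tagging, the function name centering_units is an assumption, and complements are not handled.

```python
from typing import List, Tuple


def centering_units(clauses: List[Tuple[str, str]]) -> List[List[str]]:
    """Group the clauses of one sentence into centering units.

    clauses: (text, kind) pairs in surface order; kind is one of the
             hypothetical labels "main", "conjunct", "tensed_adjunct",
             "tenseless_adjunct". Complements are not handled.
    """
    units: List[List[str]] = []
    for text, kind in clauses:
        if kind == "tenseless_adjunct" and units:
            # A tenseless adjunct does not open its own unit: attach it
            # to the current unit (if it comes first, it starts one).
            units[-1].append(text)
        else:
            # Main clauses, conjuncts and tensed adjuncts start a new unit.
            units.append([text])
    return units


if __name__ == "__main__":
    sentence = [
        ("Non è che lei gli voglia granché bene", "main"),
        ("perché lui non corre ad aprirle la porta", "tensed_adjunct"),
        ("ogni volta che si alza", "tensed_adjunct"),
        ("per lasciare la stanza", "tenseless_adjunct"),
    ]
    for unit in centering_units(sentence):
        print(" / ".join(unit))
```

Applied to the clauses of (4c) to (4e) in Figure 1, the sketch yields three units, the last grouping the tensed adjunct ogni volta che si alza with the infinitival per lasciare la stanza, which matches the segmentation used in that figure.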



Text                 Total   Full NPs   Strong   Zero   Other
(von Arnim, 1989)      111    45 (11)       23     36       7
(Fallaci, 1989)         17     6 (0)         2      9       0
(Mila, 1993)             8     1 (0)         2      4       1
(^CI, 1994)             18     7 (1)         0      7       4
(Nichetti, 1993)        40    26 (1)         1     13       0
(del Buono, 1993)       36    28 (6)         1      6       1
(Pagetti, 1993)         22    19 (6)         3      0       0
(La Nazione, 1994)      35    27 (4)         1      5       2
Total                  287   159 (29)       33     80      15

Table 2: Animate 3rd person subjects



Type     Total   CONTINUE   RETAIN   SHIFT   CENT-EST   OTHER
zero        80         56        4       6         12       2
strong      33         13        4       5         10       1
NP          81         17       11       7         44       2
poss.       25         11        5       1          8       0
Total      219         97       24      19         74       5

Table 3: Distribution of centering transitions



4.3 Centering Transitions 

Table 3 illustrates the distribution of referring expressions with respect to centering transitions. The number of full NPs in Table 3 is about half their number in Table 2; in fact, full NPs often introduce entities new to the discourse, in which case centering does not apply.

Table 3 includes two columns that don't refer to centering transitions. The column labeled CENT-EST encodes referring expressions that don't refer to a member of Cf(U_{n-1}), but to an entity available in the discourse. While such transitions do not belong to centering, which models how centers change from one centering unit to the next, they constitute referential usages of pronouns that need to be explained. I call these transitions CENT-EST, for CENTER ESTABLISHMENT, because such references appear to establish the new center of local discourse. Finally, OTHER includes e.g. expressions that build a set out of Cb(U_{n-1}) and some other entity, such as sia lui che sua moglie (both him and his wife). It is not clear how to deal with these constructions within the centering framework, and thus I have left them unanalyzed for the time being.

The results are as follows. Null subjects are, not surprisingly, the most frequently used expression for CONTINUEs (58%); the difference between null subjects and all the other referring expressions is also statistically significant (χ² = 33.760, p < 0.001).⁶ Vice versa, CONTINUEs account for 70% of null subjects. However, even full NPs can be used for CONTINUEs. As regards non-possessive full NPs, such usages account for 16% of CONTINUEs, and for 20% of those NPs. Also, 12% of CONTINUEs are encoded by means of possessive NPs, and vice versa, 41% of possessive NPs are used for CONTINUEs.

The situation for RETAINs and SHIFTs is not very clear, as none of the four categories of referring expressions is predominant. All these SHIFTs are actually SMOOTH-SHIFTs, i.e., there are no ROUGH-SHIFTs at all. This is not surprising for null subjects, which are never used for ROUGH-SHIFT (Turan, 1995); however, it is puzzling for full NPs. Apparently the Italian writers I selected adhere to the default ranking of transitions, in which ROUGH-SHIFTs are the least preferred.

A significant difference in the usages of the four referring expressions regards CENT-EST. In this case, full NPs are used 70% of the time, and the difference between full NPs and all the other expressions is significant (χ² = 21.401, p < 0.001).
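The χ² values above can be recomputed from Table 3 as Pearson statistics on 2×2 tables without continuity correction; the sketch below reproduces 33.760, and reproduces 21.401 to within rounding (21.402) if the NP and poss. rows are pooled as "full NPs", a pooling that also gives the reported 70% (52 of 74). Both the pooling and the absence of a continuity correction are inferences from the numbers, not something the text states; the function name chi_square_2x2 is hypothetical, and p-values are not computed here.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (no continuity correction) for the table

        [[a, b],
         [c, d]]   rows = expression types, columns = transition groups.
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


if __name__ == "__main__":
    # CONTINUE vs. all other columns of Table 3.
    # zero: 56 CONTINUE, 80 - 56 = 24 other; other expressions: 97 - 56 = 41, 139 - 41 = 98.
    print(f"{chi_square_2x2(56, 24, 41, 98):.3f}")  # 33.760
    # CENT-EST vs. all other columns, with the NP and poss. rows pooled:
    # NP + poss.: 44 + 8 = 52 CENT-EST, 106 - 52 = 54 other; rest: 74 - 52 = 22, 113 - 22 = 91.
    print(f"{chi_square_2x2(52, 54, 22, 91):.3f}")  # 21.402
```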

I will now focus on the contrast between zeros and strong pronouns, in order to assess the strategies proposed in (3).

The first part of (3a), null subjects used for CONTINUE, is strongly supported. Compared to strong pronouns, zeros are used 80% of the time for CONTINUE, and there is a significant difference (χ² = 9.204, p < 0.01) between zeros and strong pronouns used in CONTINUE and zeros and strong pronouns used in all other transitions taken together. Thus, in its use of null subjects for CONTINUE, Italian behaves in the same way as languages as diverse as Japanese (Kameyama, 1985; Walker et al., 1994; Shima, 1995) and Turkish (Turan, 1995; Turan, this volume).

⁶ The χ² test results are reported here more as a source of suggestive evidence than as strong indicators, as the observations in the corpus, which come from only 8 authors, are not totally independent.

However, as the percentage of strong pronouns used for CONTINUE is not negligible, I set out to investigate which factors may affect such a choice. I analyzed the CONTINUEs in my corpus, and I did find that one relevant factor is the transition preceding the CONTINUE in question. Table 4 shows the different possible transitions in U_n, the utterance preceding the U_{n+1} in which a CONTINUE occurs. The configuration in which a CONTINUE is preceded by a RETAIN, which I call RET-CONT, differs from the other two because of the constraint Cp(U_n) ≠ Cb(U_n) in the RETAIN. In a sense, this predicts that the center will shift, but in a RET-CONT the prediction is not fulfilled. As Table 5 shows, this has some consequences on the usage of null and strong pronouns.

Transition in U_n   Condition in U_n                 Condition in U_{n+1} (CONTINUE)
CONTINUE            Cb_n = Cb_{n-1}, Cp_n = Cb_n     Cb_{n+1} = Cb_n, Cp_{n+1} = Cb_{n+1}
RETAIN              Cb_n = Cb_{n-1}, Cp_n ≠ Cb_n     Cb_{n+1} = Cb_n, Cp_{n+1} = Cb_{n+1}
SHIFT               Cb_n ≠ Cb_{n-1}, Cp_n = Cb_n     Cb_{n+1} = Cb_n, Cp_{n+1} = Cb_{n+1}

Table 4: Transitions preceding a CONTINUE

Type     Total   CONT-CONT + SHIFT-CONT   RET-CONT
zero        56                       51          5
strong      13                        7          6
Total       69                       58         11

Table 5: Pronoun occurrences for RET-CONT

Compared to strong pronouns, null subjects are used 87% of the time for CONT-CONT and SHIFT-CONT taken together, and only 45% of the time for RET-CONT. Moreover, if a zero is used in a CONTINUE, that CONTINUE is ten times more likely to be a CONT-CONT or SHIFT-CONT than a RET-CONT; in contrast, if a strong pronoun is used in a CONTINUE, that CONTINUE is about as likely to be a CONT-CONT or a SHIFT-CONT as a RET-CONT.⁷ These trends in usage are confirmed by a strongly significant difference between zeros and strong pronouns used in CONT-CONT plus SHIFT-CONT, and zeros and strong pronouns used in RET-CONT (χ² = 10.910, p < 0.001). Figure 1 presents two examples of RET-CONT, one in (4c) realized with a strong pronoun, the second in (4e) realized with a null subject. In the utterance preceding (4a), Cb = Irais and Cf = [Irais].
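Both zero-versus-strong comparisons in this subsection can be rechecked from Tables 3 and 5. Assuming, as the numbers suggest, plain Pearson statistics on 2×2 tables without continuity correction, the sketch below reproduces χ² = 9.204 and χ² = 10.910; the function name is hypothetical. Note that the smallest expected count in the Table 5 comparison is about 2 (13 × 11 / 69), one more reason to read these statistics as suggestive evidence, as the text itself does.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


if __name__ == "__main__":
    # Table 3, zeros vs. strong pronouns, CONTINUE vs. all other transitions:
    # zero 56 and 80 - 56 = 24; strong 13 and 33 - 13 = 20.
    print(f"{chi_square_2x2(56, 24, 13, 20):.3f}")  # 9.204
    # Table 5, CONT-CONT + SHIFT-CONT vs. RET-CONT: zero 51 and 5; strong 7 and 6.
    print(f"{chi_square_2x2(51, 5, 7, 6):.3f}")  # 10.910
```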



(4a) ∅ Incomincerò a ricondurre il suo pensiero sui suoi doveri chiedendole ogni giorno
     (I) will start to bring her thoughts back to her duties by asking her every day
     Cf: [Irais > I's thoughts, I's duties], Cb: Irais, CONTINUE
(4b) come sta suo marito.
     how her husband is.
     Cf: [husband > Irais], Cb: Irais, RETAIN
(4c) Non è che lei gli voglia granché bene,
     It's not the case that she cares much about him
     Cf: [Irais > husband], Cb: Irais, CONTINUE
(4d) perché lui non corre ad aprirle la porta
     because he doesn't run to open the door for her
     Cf: [husband > Irais], Cb: Irais, RETAIN
(4e) ogni volta che ∅ si alza per lasciare la stanza;
     whenever (she) gets up to leave the room.
     Cf: [Irais], Cb: Irais, CONTINUE

Figure 1: Examples of RET-CONT

⁷ Also (Turan, 1995) independently noticed the existence of RET-CONTs, and reports results similar to mine.

As far as RETAINs and SHIFTs go, the numbers are both too small to draw any conclusion, and they don't seem to identify any preferred usage for strong pronouns, contrary to what is claimed by (3a); also in the case of CENT-EST there doesn't seem to be any significant difference in usage. A topic for future work is to verify whether there are any factors affecting the choice between null and strong pronouns in these cases, especially because null subjects used for SHIFT or for CENT-EST sometimes result in a slightly less coherent discourse.

The second part of the claim, (3b), that a null subject can be used if U_n provides syntactic clues that force the null subject not to refer to Cb(U_{n-1}), is supported; however, given the small numbers (four RETAINs and six SHIFTs), this conclusion can only be tentative. The most frequent clue is agreement in gender and/or number.

5 Conclusions 

In this paper, I examined the referring functions that different types of subjects perform in Italian within the centering framework. I built on the analysis presented in (Di Eugenio, 1990), and extended it in several directions: first, I used a corpus of naturally occurring examples; second, I included phenomena such as possessives and subordinate clauses; third, I refined the notion of CONTINUE by pointing out the peculiarity of RET-CONTs; fourth, I included full NPs; fifth, I illustrated a type of pronominal usage, CENT-EST, outside the purview of centering.

Future work includes further analysis of a somewhat surprising finding from the current study, i.e. that NPs encoding CONTINUEs are not so rare. It is worthwhile to examine the data further, to see under which conditions a full NP is licensed to encode a CONTINUE. I also want to collect more RET-CONTs, RETAINs, and SMOOTH-SHIFTs to refine the analysis presented in this paper. Finally, another topic of research is CENT-EST, even if it is outside the centering framework, and under what conditions zeros are used to encode it.